Accurate continuous geographic assignment from low- to high-density SNP data
Abstract
MOTIVATION: Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically based geographic assignments also have a range of possible applications in forensics, for profiling both victims and criminals, and in wildlife management, where poaching hotspots can be located. Such assignments, however, require fast and accurate statistical methods to handle the growing amount of genetic information made available by genotype arrays and next-generation sequencing technologies.

RESULTS: We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation, which represents one of the major recent breakthroughs in statistics, as it does not require Monte Carlo simulations. We compare the performance of our method with that of an alternative method for geospatial inference, SPA, in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida Scrub-jays (Aphelocoma coerulescens), Arabidopsis thaliana and humans, representing 41 to 197,146 SNPs. Our method appears to be best suited to the analysis of medium-sized datasets (a few tens of thousands of loci), such as the reduced-representation sequencing data that are becoming increasingly available in ecology.

AVAILABILITY AND IMPLEMENTATION: http://www2.imm.dtu.dk/~gigu/Spasiba/

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Related articles
Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method
The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. Using simulated datasets, the performance of Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes, each set to 1 Morgan in length. The number of SNPs per chromosome was 10,000. One hundred QTLs were randomly distributed a...
Using RNA-Seq for Genomic Scaffold Placement, Correcting Assemblies, and Genetic Map Creation in a Common Brassica rapa Mapping Population
Brassica rapa is a model species for agronomic, ecological, evolutionary, and translational studies. Here, we describe high-density SNP discovery and genetic map construction for a B. rapa recombinant inbred line (RIL) population derived from field-collected RNA sequencing (RNA-Seq) data. This high-density genotype data enables the detection and correction of putative genome misassemblies and a...
Assessment of the completeness of Volunteered Geographic Information focusing on building blocks data (Case Study: Tehran metropolis)
OpenStreetMap (OSM) is currently the largest collection of volunteered geographic data and is widely used in many projects as an alternative to, or integrated with, authoritative data. However, the quality of these data has been one of the obstacles to their wider use. In this article, from among the elements related to the quality of volunteered geographic data, we have tried to examine the com...
Enhancing healthcare accessibility measurements using GIS: A case study in Seoul, Korea
With recent demographic aging trends, the need to enhance geospatial analysis capabilities and to monitor citizens' access to healthcare services has increased. Accessibility to healthcare is determined not only by geographic distance to service locations, but also by travel time, available modes of transportation, and departure time. Having acces...
Journal: Bioinformatics
Volume 32, Issue 7
Publication date: 2016